Effect of look-ahead search depth in learning position evaluation functions for Othello using epsilon-greedy exploration

Authors

  • Thomas Philip Runarsson
  • Egill O. Jonsson
Abstract

This paper studies the effect of varying the look-ahead depth of heuristic search in temporal difference (TD) learning and game playing. It examines the acquisition of position evaluation functions for the game of Othello. The paper provides important insights into the strengths and weaknesses of using different search depths during learning when ε-greedy exploration is applied. The main finding is that, contrary to popular belief, better playing strategies for Othello are found when TD learning is applied with lower look-ahead search depths.
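As a rough illustration of the two ingredients named in the abstract, the sketch below combines depth-limited look-ahead with ε-greedy move selection. It is not the authors' implementation: the toy subtraction game (take 1 or 2 stones, taking the last stone wins), the `evaluate` heuristic, and all function names are assumptions chosen only to keep the example self-contained.

```python
import random

def legal_moves(stones):
    # Toy subtraction game standing in for Othello (an assumption):
    # a move removes 1 or 2 stones.
    return [m for m in (1, 2) if m <= stones]

def evaluate(stones):
    # Hypothetical position evaluation for the player to move.
    # With no stones left, the player to move has already lost.
    return -1.0 if stones == 0 else 0.0

def negamax(stones, depth):
    # Depth-limited look-ahead: value of the position for the player to move.
    moves = legal_moves(stones)
    if depth == 0 or not moves:
        return evaluate(stones)
    return max(-negamax(stones - m, depth - 1) for m in moves)

def epsilon_greedy_move(stones, depth, epsilon, rng):
    # Assumes depth >= 1 and at least one legal move.
    moves = legal_moves(stones)
    if rng.random() < epsilon:
        return rng.choice(moves)  # explore: pick a random legal move
    # Exploit: score each move by searching depth - 1 plies below it.
    return max(moves, key=lambda m: -negamax(stones - m, depth - 1))

if __name__ == "__main__":
    rng = random.Random(0)
    for depth in (1, 5):
        print(depth, epsilon_greedy_move(5, depth, 0.0, rng))
```

With `epsilon` set to 0 the selection is purely greedy, so the chosen move depends only on the search depth. In this toy game a one-ply search cannot tell the two moves apart from five stones, while a five-ply search finds the winning move. The paper's result concerns which depth gives better learned evaluation functions, which this sketch does not attempt to reproduce.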


Similar resources

Avoiding the Look-Ahead Pathology of Decision Tree Learning

Most decision-tree induction algorithms are using a local greedy strategy, where a leaf is always split on the best attribute according to a given attribute selection criterion. A more accurate model could possibly be found by looking ahead for alternative subtrees. However, some researchers argue that the look-ahead should not be used due to a negative effect (called "decision tree pathology")...


Improving heuristic mini-max search by supervised learning

This article surveys three techniques for enhancing heuristic game-tree search pioneered in the author’s Othello program LOGISTELLO, which dominated the computer Othello scene for several years and won against the human World-champion 6–0 in 1997. First, a generalized linear evaluation model (GLEM) is described that combines conjunctions of Boolean features linearly. This approach allows an aut...


Classes of Multiagent Q-learning Dynamics with epsilon-greedy Exploration

Q-learning in single-agent environments is known to converge in the limit given sufficient exploration. The same algorithm has been applied, with some success, in multiagent environments, where traditional analysis techniques break down. Using established dynamical systems methods, we derive and study an idealization of Q-learning in 2-player 2-action repeated general-sum games. In particular, ...


Dynamic Locomotion Skills for Obstacle Sequences Using Reinforcement Learning

Most locomotion control strategies are developed for flat terrain. We explore the use of reinforcement learning to develop motor skills for the highly dynamic traversal of terrains having sequences of gaps, walls, and steps. Results are demonstrated using simulations of a 21-link planar dog and a 7-link planar biped. Our approach is characterized by: non-parametric representation of the value f...


Giraffe: Using Deep Reinforcement Learning to Play Chess

This report presents Giraffe, a chess engine that uses self-play to discover all its domain-specific knowledge, with minimal hand-crafted knowledge given by the programmer. Unlike previous attempts using machine learning only to perform parameter tuning on hand-crafted evaluation functions, Giraffe’s learning system also performs automatic feature extraction and pattern recognition. The trained ...



Publication date: 2007